
GRANT  NO:  DAMD17-94-J-4015 


TITLE:  Digital  Image  Database  with  Gold  Standard  and  Performance  Metrics 

for  Mammographic  Image  Analysis  Research 


PRINCIPAL  INVESTIGATOR:  Kevin  W.  Bowyer,  Ph.D. 


CONTRACTING  ORGANIZATION:  University  of  South  Florida 
Tampa, Florida 33620-7900


REPORT DATE: 27 July 1995


TYPE OF REPORT: Annual Report




PREPARED  FOR: 


U.S.  Army  Medical  Research  and  Materiel  Command 
Fort  Detrick,  Maryland  21702-5012 


DISTRIBUTION  STATEMENT: 


Approved  for  public  release; 
distribution  unlimited 


The views, opinions and/or findings contained in this report are
those of the author(s) and should not be construed as an official
Department of the Army position, policy or decision unless so
designated by other documentation.


REPORT  DOCUMENTATION  PAGE 

Form  Approved 

OMB  No.  0704-0188 

Public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources, 
gathering  and  maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this 
collection  of  information,  including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson 
Davis  Highway,  Suite  1204,  Arlington,  VA  22202-4302,  and  to  the  Office  of  Management  and  Budget,  Paperwork  Reduction  Project  (0704-0188),  Washington,  DC  20503. 

1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE

27 July 1995

3. REPORT TYPE AND DATES COVERED

Annual, 1 Jul 94 - 30 Jun 95

4.  TITLE  AND  SUBTITLE 

Digital Image Database with Gold Standard and Performance
Metrics for Mammographic Image Analysis Research

5.  FUNDING  NUMBERS 

DAMD17-94-J-4015 

6.  AUTHOR(S) 

Kevin  W.  Bowyer 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

University  of  South  Florida 

Tampa,  Florida  33620-7900 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

9.  SPONSORING /MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

U.S.  Army  Medical  Research  and  Materiel  Command 

Fort  Detrick,  Maryland  21702-5012 

10.  SPONSORING /MONITORING 

AGENCY  REPORT  NUMBER 

11.  SUPPLEMENTARY  NOTES 

12a.  DISTRIBUTION /AVAILABILITY  STATEMENT 

Approved  for  public  release,  distribution  unlimited 

12b.  DISTRIBUTION  CODE 

13.  ABSTRACT  (Maximum  200  words) 

An  infrastructure  resource  is  being  created  for  use  by  researchers  working  on 
computerized  image  analysis  algorithms  to  aid  in  mammogram  screening.  The 
resource  will  contain  3,000  cases  of  data.  Each  case  will  consist  of  4  images 
plus  associated  data.  The  resource  will  be  available  to  the  research  community 
over  the  internet  and  on  tapes  sent  through  the  mail. 

14.  SUBJECT  TERMS 

Image Database, Digitized Images, Mammogram Screening,
Infrastructure Resource, Breast Cancer

15.  NUMBER  OF  PAGES 

14 

16.  PRICE  CODE 

17.  SECURITY  CLASSIFICATION 

OF  REPORT 

Unclassified 

18.  SECURITY  CLASSIFICATION 

OF  THIS  PAGE 

Unclassified 

19.  SECURITY  CLASSIFICATION 

OF  ABSTRACT 

Unclassified 

20.  LIMITATION  OF  ABSTRACT 

Unlimited

NSN  7540-01-280-5500  Standard  Form  298  (Rev.  2-89) 


Prescribed by ANSI Std. Z39-18




FOREWORD 


Opinions, interpretations, conclusions and recommendations are
those of the author and are not necessarily endorsed by the U.S.
Army.

Where copyrighted material is quoted, permission has been
obtained to use such material.

Where material from documents designated for limited
distribution is quoted, permission has been obtained to use the
material.

Citations of commercial organizations and trade names in
this report do not constitute an official Department of Army
endorsement or approval of the products or services of these
organizations.

In conducting research using animals, the investigator(s)
adhered to the "Guide for the Care and Use of Laboratory
Animals," prepared by the Committee on Care and Use of Laboratory
Animals of the Institute of Laboratory Resources, National
Research Council (NIH Publication No. 86-23, Revised 1985).

For the protection of human subjects, the investigator(s)
adhered to policies of applicable Federal Law 45 CFR 46.

In conducting research utilizing recombinant DNA technology,
the investigator(s) adhered to current guidelines promulgated by
the National Institutes of Health.

In the conduct of research utilizing recombinant DNA, the
investigator(s) adhered to the NIH Guidelines for Research
Involving Recombinant DNA Molecules.

In the conduct of research involving hazardous organisms,
the investigator(s) adhered to the CDC-NIH Guide for Biosafety in
Microbiological and Biomedical Laboratories.




Contents 


1 Introduction

2 Body

2.1 Database organized as cases

2.2 Distribution of cases across five highest-level categories

2.3 Definition of “clearly normal”

2.3.1 Breakdown of “clearly normal” by ACR density rating

2.4 Definition of “normal after recall”

2.5 Definition of “abnormal - benign”

2.6 Definition of “abnormal - cancer”

2.7 Definition of “false negative”

2.8 Selection of cases “in sequence”

3 Digitization of films

3.1 Spatial and intensity resolution

3.2 Additional non-image information

4 Radiologist annotation of “ground truth”

4.1 Availability to the research community

5 Conclusions

6 References

A Log of Internet FTP Accesses in the Past Year


1  Introduction 


The  goal  of  this  project  is  to  establish  a  database  for  use  by  the  digital  mammographic 
image  analysis  research  community.  The  primary  purpose  of  the  database  is  to  facilitate  sound 
experimental  research  in  the  development  of  computer  algorithms  to  aid  in  screening.  The 
database  will  eventually  contain  approximately  3,000  cases.  Each  case  will  include  the  standard 
two  images  of  each  breast,  meaning  a  total  of  12,000  individual  images.  Along  with  the  images, 
each  case  will  contain  some  associated  patient  information  and  specification  of  parameters  of 
the  image  acquisition  and  digitization  process. 

Previously,  most  research  on  computer  image  analysis  for  mammogram  screening  has  used 
a  “small”  (10s  to  perhaps  100)  number  of  images.  Also,  researchers  have  generally  not  been 
able  to  evaluate  their  work  using  the  same  images  used  by  other  researchers.  The  infrastructure 
resource  created  through  this  project  should  address  both  of  these  problems. 


2  Body 

This  section  outlines  the  conceptual  organization  of  the  database  in  terms  relevant  to  the  context 
of  a  screening  program.  Details  of  the  particular  file  format  and  storage  media  will  evolve  over 
the  course  of  the  project  and  are  not  discussed  in  detail  here. 

2.1  Database  organized  as  cases 

At  the  highest  level,  the  database  is  organized  as  a  set  of  approximately  3,000  individual  cases. 
A  “case”  is  defined  as  a  standard  screening  exam  of  two  images  of  each  breast,  plus  selected 
additional  non-image  information. 

The  cases  will  be  divided  across  five  broad  categories  of  result,  as  defined  immediately  below. 
To  guard  against  inadvertent  bias  in  selection  of  cases,  the  majority  of  the  cases  will  be  selected 
“in sequence.” See section 2.8 for the definition of “in sequence.”

2.2 Distribution of cases across five highest-level categories

The five categories of cases are: (1) clearly normal, (2) normal after recall, (3) abnormal
- benign, (4) abnormal - cancer and (5) false negative. This categorization is chosen to be
relevant  in  the  context  of  a  screening  program.  The  planned  approximate  number  of  cases  in 
each  category  is  as  follows: 


    clearly normal            800
    normal after recall       200
    abnormal - benign       1,000
    abnormal - cancer       1,000
    false negative           ≈ 10

2.3 Definition of “clearly normal”

This  category  is  defined  as  cases  presented  for  screening  which: 

•  were  read  as  normal  from  the  standard  screening  exam  of  two  views  of  each  breast,  and 

•  had  a  subsequent  exam  four  years  later  (plus/minus  six  months)  which  was  also  read  as 
normal  with  no  more  work-up  than  additional  views,  and 

•  for  which  there  is  no  clinical  evidence  of  malignancy. 

Note  that  the  early  case,  not  the  follow-up  case,  is  the  “clearly  normal”  case  that  goes  into  the 
database.  A  case  which  falls  into  this  category  has,  by  this  definition,  no  suspicious  region  in 
any  of  the  four  images. 

2.3.1  Breakdown  of  “clearly  normal”  by  ACR  density  rating 

The  American  College  of  Radiology  “BI-RADS”  terminology  specifies  a  rating  of  breast  density 
on  a  scale  of  1  (“almost  entirely  fat”)  to  4  (“extremely  dense”)  [1].  Each  of  the  cases  in  the 
database  will  be  accompanied  by  a  value  for  the  breast  density  rating,  as  assigned  by  an  expert 
mammographer.  The  breast  density  rating  is  assumed  to  be  the  same  for  all  four  images  in  a 
given  study.  (Note  -  all  of  the  standard  caveats  about  observer  variability  apply  to  the  breast 
density  ratings  assigned  to  the  cases.) 

The four categories of breast density do not occur with equal frequency in the typical
screening population; the highest density rating occurs least frequently. Also, it is generally
believed that a higher breast density rating presents a greater challenge for correct
interpretation. To ensure a sufficient number of the highest-density breasts to allow
construction of reliable classifiers, each breast density subcategory will be represented by at
least 15% of the total “clearly normal” cases.
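
As a rough illustration of the 15% floor: with the planned 800 “clearly normal” cases, each density rating would need at least 120 cases. The sketch below checks a tally of cases per rating against that floor. The function name and the example counts are hypothetical and are not project data.

```python
def density_floor_shortfalls(counts, min_percent=15):
    """Return {rating: cases still needed} for ratings below the floor.

    counts maps a BI-RADS density rating (1-4) to a number of cases.
    The floor is min_percent of the total, rounded up (integer arithmetic).
    """
    total = sum(counts.values())
    floor = (total * min_percent + 99) // 100
    return {
        rating: floor - counts.get(rating, 0)
        for rating in (1, 2, 3, 4)
        if counts.get(rating, 0) < floor
    }


# Hypothetical tally of 800 cases (illustrative numbers only).
example = {1: 200, 2: 320, 3: 200, 4: 80}
print(density_floor_shortfalls(example))
```

An empty result means every density subcategory meets the floor; a non-empty result lists the ratings that would need additional cases.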

2.4  Definition  of  “normal  after  recall” 

This  category  is  defined  as  cases  presented  for  screening  which: 

•  were read as normal only after the reading of additional views beyond the standard
screening exam, and

•  had  no  need  of  follow-up  other  than  the  additional  views,  and 

•  have  had  at  least  four  years  of  subsequent  negative  screening  exams. 

A  case  which  falls  into  this  category  has,  by  this  definition,  at  least  one  suspicious  region  in 
at  least  one  image.  Each  image  which  contains  a  suspicious  region  has  an  associated  “overlay” 
which  records  the  location  and  type  of  the  region. 

2.5 Definition of “abnormal - benign”

This  category  is  defined  as  cases  presented  for  screening  which  contained  a  suspicious  area 
that  was  determined  to  be  benign  on  the  basis  of  (a)  biopsy,  or  (b)  the  clear  demonstration  of 
a  cyst  by  ultrasound  or  aspiration.  A  case  in  this  category  has  at  least  one  suspicious  region  in 
at least one image, and each such image has an associated overlay file.

2.6 Definition of “abnormal - cancer”

This  category  is  defined  as  cases  presented  for  screening  which  contained  a  suspicious  area 
that  was  determined  to  be  cancer  on  the  basis  of  biopsy.  A  case  in  this  category  has  at  least 
one  suspicious  region  in  at  least  one  image,  and  each  such  image  has  an  associated  overlay  file. 

2.7  Definition  of  “false  negative” 

This category is defined as cases presented for screening which were initially read as clearly
normal or as normal after recall, but were later determined to have a cancer present. This
category is further sub-divided into cancers which are (a) “clear in retrospect,” and (b) “not
clear in retrospect.” The images for a case in this category may or may not have an associated
overlay file.


2.8  Selection  of  cases  “in  sequence” 

In  order  to  avoid  any  inadvertent  selection  bias,  and  to  try  to  at  least  partly  reflect  the 
natural  variability  seen  in  a  screening  program,  the  first  80%  of  the  cases  in  each  (sub)category 
as  described  above  will  be  taken  “in  sequence”  from  the  stream  of  cases  at  an  institution.  This 
means  (a)  choosing  a  time  period,  (b)  considering  each  case  in  sequence  in  that  time  period, 
and  (c)  accumulating  each  case  into  the  appropriate  (sub)category  until  the  required  number 
has  been  met. 

As  an  example,  consider  the  selection  of  “clearly  normal”  cases  at  a  given  institution.  Say 
that the selection period begins June 1, 1995 and runs until sufficient cases are acquired. Starting
with  June  1,  each  case  that  is  read  as  clearly  normal  is  checked  to  see  if  there  was  a  study  done 
in  the  window  of  42  to  54  months  previous  (4  years  +/-  6  months).  If  such  a  previous  study 
exists,  and  it  was  read  as  clearly  normal,  and  it  is  of  reasonable  technical  quality,  then  that 
study  is  accumulated  for  the  database. 

Once 80% of the planned number of cases in a (sub)category has been accumulated, the
characteristics of the cases accumulated to that point will be reviewed. If necessary, the
remaining 20% of the cases may be targeted to exhibit characteristics which are thought to be
important but which seem under-represented in the first 80%. The sequence/selected status of a
given case is part of the associated non-image information for that case.
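
The “in sequence” procedure can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names, the `(case_id, category)` stream representation, and the whole-calendar-month approximation of the 42 to 54 month window are all hypothetical and not part of the project's actual tooling.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from earlier to later (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def has_qualifying_prior(exam_date, prior_dates, lo=42, hi=54):
    """True if any prior study falls 42 to 54 months before exam_date."""
    return any(lo <= months_between(p, exam_date) <= hi for p in prior_dates)


def select_in_sequence(stream, planned, fraction_pct=80):
    """Accumulate cases per category, in stream order, up to 80% of plan.

    stream: iterable of (case_id, category) in chronological order.
    planned: {category: planned number of cases}.
    """
    quota = {c: n * fraction_pct // 100 for c, n in planned.items()}
    chosen = {c: [] for c in planned}
    for case_id, category in stream:
        if category in chosen and len(chosen[category]) < quota[category]:
            chosen[category].append(case_id)
    return chosen


# Hypothetical stream: six clearly normal cases against a plan of five.
stream = [(i, "clearly normal") for i in range(6)]
print(select_in_sequence(stream, {"clearly normal": 5}))
print(has_qualifying_prior(date(1995, 6, 1), [date(1991, 6, 15)]))
```

Cases are taken strictly in stream order, which is what guards against selection bias; the remaining 20% of each quota is left open for the targeted selection described above.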


3  Digitization  of  films 

Cases  selected  for  the  database  will  have  the  original  films  digitized  according  to  the  following 
specifications. 

3.1  Spatial  and  intensity  resolution 

Based  on  current  knowledge,  available  technology,  and  budget  constraints,  the  decision  was 
made  to  begin  accumulating  studies  for  the  database  using  a  digitizer  capable  of  a  spatial 
resolution  of  21  microns,  with  an  intensity  resolution  of  16  bits. 
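
To give a feel for the data volume these settings imply, the sketch below estimates the uncompressed size of one digitized film. The report does not state film dimensions, so the 18 cm x 24 cm film used in the example is purely an assumption, as is the function name. Under that assumption a single image is on the order of 200 MB before compression.

```python
def image_size_bytes(width_mm, height_mm, pixel_um=21, bits_per_pixel=16):
    """Uncompressed size of one digitized film, in bytes."""
    pixels_per_line = round(width_mm * 1000 / pixel_um)
    lines = round(height_mm * 1000 / pixel_um)
    bytes_per_pixel = (bits_per_pixel + 7) // 8  # 16 bits -> 2 bytes
    return pixels_per_line * lines * bytes_per_pixel


# Assumed 180 mm x 240 mm film; four images per case.
per_image = image_size_bytes(180, 240)
print(per_image, 4 * per_image)
```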




3.2 Additional non-image information

Each  case  will  contain  the  following  additional  non-image  information: 

•  date  that  the  mammography  exam  was  performed. 

•  age  of  patient  at  the  time  of  the  exam. 

•  film  manufacturer,  film  type,  and  film  processing  (extended/regular). 

•  BI-RADS  breast  density  rating  (1,  2,  3,  or  4). 

•  date  that  the  films  were  digitized. 

•  institution  at  which  the  exam  was  performed  (A,  B,  C,  ...). 

•  file  name. 

•  number  of  lines  per  image. 

•  number  of  pixels  per  line. 

•  number  of  bytes  per  pixel. 

•  information  on  suspicious  regions  in  each  image: 

-  location. 

-  rating  of  subtlety  on  the  following  scale: 

5  obvious. 

4  detectable  by  an  unsophisticated  medical  person. 

3  detectable  by  a  competent,  ACR-accredited  physician. 

2  reasonably  likely  to  be  detected  by  an  expert. 

1  reasonably  unlikely  to  be  detected  by  an  expert. 

0  completely  occult  -  any  expert  would  miss  it. 

-  benign  /  malignant  indication,  and  type  of  pathology  if  malignant. 

-  screen detected but visible previously / screen detected / visible de novo.

The  motivation  for  the  rating  of  subtlety  is  that  a  bar  chart  of  the  ratings  of  the  cases,  or 
selected  subsets  of  the  cases,  can  be  used  as  a  qualitative  indication  of  the  overall  difficulty  of 
the  cases  considered. 
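
A minimal sketch of such a bar chart, rendered as text, is shown below. The function name and the example ratings are hypothetical; only the 0 to 5 scale comes from the list above.

```python
from collections import Counter


def subtlety_bar_chart(ratings):
    """Return a text bar chart of subtlety ratings, from 5 (obvious) to 0 (occult)."""
    for r in ratings:
        if r not in range(6):
            raise ValueError(f"subtlety rating out of range: {r!r}")
    counts = Counter(ratings)
    lines = []
    for rating in range(5, -1, -1):
        n = counts.get(rating, 0)
        lines.append(f"{rating} | {'#' * n} ({n})")
    return "\n".join(lines)


# Hypothetical ratings for a small subset of suspicious regions.
print(subtlety_bar_chart([5, 5, 4, 3, 3, 3, 2, 0]))
```

A chart weighted toward the low ratings would suggest a comparatively difficult subset of cases, and one weighted toward 5 a comparatively easy one.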


4  Radiologist  annotation  of  “ground  truth” 

Each  case  falling  in  any  category  other  than  clearly  normal  will  have  an  “overlay”  associated 
with  at  least  one  of  the  images.  The  overlay  will  specify  “ground  truth”  information  about  the 
locations  and  types  of  suspicious  regions  in  the  image. 

The information in the overlay originates from an expert radiologist marking on the
non-emulsion side of the film with a narrow-tip grease pencil. A film containing a suspicious
region is first marked by the radiologist, then digitized, then cleaned and digitized again.

The following procedures are followed for marking the outline of a suspicious region on the
film. In all cases, the markings are acknowledged to be approximate, based on best judgment
from the information presented in the image.

non-specific suspicious region. The radiologist will mark the outline of the suspicious region
as a single closed curve.

circumscribed  lesion.  The  radiologist  will  mark  the  outline  of  the  lesion  as  a  single  closed 
curve. 

cluster  of  microcalcifications.  The  radiologist  will  mark  the  outline  of  the  cluster  as  a  whole 
as  a  single  closed  curve.  In  addition,  3  or  more  of  the  calcifications  in  each  cluster  will  be 
indicated  individually  by  an  arrow  pointing  to  each. 

spiculated lesion. The radiologist will use whichever of two approaches they deem most
suitable to the particular lesion. (1) The border of the lesion will be marked as a single closed
curve intended to capture the rough shape of the lesion; the result will include major spicules,
but not necessarily all spicules, and may include some tissue that is not part of the lesion. (2)
The central mass of the lesion will be drawn as a single closed curve and major spicules or groups
of spicules will be marked as single lines running down the rough center of the spicule.

architectural distortions and asymmetries. The radiologist will draw a closed-curve outline
around the border of the suspicious region, intended to capture the dominant shape of the
suspicious area but perhaps not its fine detail.

Note  —  In  order  to  acquire  experience  with  the  simpler  tasks  first,  the  accumulation  of  studies 
is  beginning  with  “clearly  normal”  cases.  The  details  of  the  approach  outlined  in  this  section 
may  change  prior  to  other  categories  of  cases  being  accumulated  for  the  database. 


4.1  Availability  to  the  research  community 

The  first  release  of  data  acquired  as  part  of  this  project  should  become  generally  available  to 
the  research  community  in  the  near  future.  We  will  initially  support  access  to  the  database  via 
ftp  over  the  internet  and  via  8  mm  “exabyte”  tapes  sent  through  the  mail.  We  expect  to  expand 
to  handle  additional  tape  formats  and  optical  disk  as  technology  progresses.  We  are  continuing 
to  provide  ftp  access  to  an  existing  database  of  images  provided  by  Nico  Karssemeijer  and  used 
in  his  published  work  on  algorithms  for  the  detection  of  microcalcifications  [2,3].  The  Appendix 
provides  a  log  of  ftp  accesses  to  this  data  during  the  previous  year. 

5  Conclusions 

The use of a large, common database of high quality mammogram images with
radiologist-specified ground truth should improve the quality and speed the progress of research
in computer image analysis as an aid to screening. We are in the first year of a four-year project
to establish such a database. The digitizer has been acquired and installed at Massachusetts
General Hospital. Images are being transferred to the University of South Florida. Details of file
formats, compression and other considerations are being finalized prior to making initial data
from this project available to the community. Even though data acquired under this project is
not yet available to the research community, nearly 70 distinct login ids initiated ftp sessions
with the USF site in the last year for purposes of transferring (older) mammogram images.

A Log of Internet FTP Accesses in the Past Year

The following is an excerpt from the log of ftp accesses made to the address figment.csee.usf.edu
for the directory pub/mammograms during the period 1 July 1994 through 30 June 1995. All but
the first entry for any given login id have been deleted. (In many cases, one login id accounted
for many distinct login sessions at different times during the year, and some sessions resulted in
as many as 100 transaction entries in the log file.) The internet addresses in the log file reveal
a wide breadth of military, commercial, government and university institutions in the United
States, as well as many accesses from outside the US.
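
For readers who want to tabulate the entries, the sketch below splits one entry into named fields. The report does not describe the log format, so the field names are assumptions: the columns appear to follow the transfer-log layout written by wu-ftpd-style FTP servers (timestamp, transfer time in seconds, remote host, bytes transferred, file name, transfer type, action flag, direction, access mode, login). The example input is entry 3 with its two printed lines joined into one. Entries whose file name was garbled into several words will be rejected rather than guessed at.

```python
from typing import NamedTuple


class Transfer(NamedTuple):
    timestamp: str      # e.g. "Mon Aug 15 15:02:46 1994"
    seconds: int        # transfer time
    host: str           # remote host name or address
    size: int           # bytes transferred
    path: str           # file requested
    transfer_type: str  # "a" (ascii) or "b" (binary), by assumption
    login: str          # login id given by the user


def parse_entry(line):
    """Parse one appendix entry (both printed lines joined) into a Transfer."""
    fields = line.split()
    # Drop the leading "N." entry number added in the appendix listing.
    if fields and fields[0].endswith(".") and fields[0][:-1].isdigit():
        fields = fields[1:]
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, got {len(fields)}")
    timestamp = " ".join(fields[0:5])
    seconds, host, size, path, transfer_type = fields[5:10]
    return Transfer(timestamp, int(seconds), host, int(size), path,
                    transfer_type, fields[-1])


entry = ("3. Mon Aug 15 15:02:46 1994 1 fugu.Colorado.EDU 2882 "
         "/pub/mammograms/nijmegen-images/ReadMe.2 a _ o a sharpe@")
print(parse_entry(entry))
```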

1. Mon Aug 8 11:30:20 1994 2146 kodaki.kodak.com 3647059 /pub/mammograms/nijmegen-images/c17c.ima.Z b _ o a _/05Hu@
2. Mon Aug 8 12:25:33 1994 456 kodaki.kodak.com 1522729 /pub/mammograms/nijmegen-images/c16c.ima.Z b _ o a pawlicki@kodak.com
3. Mon Aug 15 15:02:46 1994 1 fugu.Colorado.EDU 2882 /pub/mammograms/nijmegen-images/ReadMe.2 a _ o a sharpe@
4. Tue Aug 16 11:37:45 1994 1 picard.coma.sbg.ac.at 2601 /pub/mammograms/nijmegen-images/ReadMe.1 b _ o a uhl@
5. Tue Aug 16 15:14:51 1994 1 129.92.140.47 2882 /pub/mammograms/nijmegen-images/ReadMe.2 a _ o a ckocur@afit.af.mil
6. Mon Aug 22 17:36:21 1994 1 pipeline.com 2601 /pub/mammograms/nijmegen-images/ReadMe.1 a _ o a -gopher@pipel.pipeline.com
7. Thu Aug 25 11:42:18 1994 17 vision.csee.Lehigh.EDU 307215 /pub/mammograms/registration/pairl image2 Jw.pgm b _ o a nsv2@vision.csee.lehigy.edu
8. Fri Aug 26 17:36:22 1994 1 atilla.afit.af.mil 2882 /pub/mammograms/nijmegen-images/ReadMe.2 a _ o a jkelley@afit.af.mil
9. Mon Aug 29 10:32:12 1994 1 maze.ruca.ua.ac.be 2601 /pub/mammograms/nijmegen-images/ReadMe.1 b _ o a meersman@
10. Mon Aug 29 22:58:43 1994 7 zeus.ee.uwa.edu.au 2882 /pub/mammograms/nijmegen-images/ReadMe.2 a _ o a chandra@ee.uwa.edu.au
11. Sat Sep 3 12:55:51 1994 12 eustis.cs.ucf.edu 792725 /pub/mammograms/nijmegen-images/c19c.ima.Z b _ o a calin@cs.ucf.edu
12. Tue Sep 13 08:03:13 1994 1 boifzc.cineca.it 2049 /pub/mammograms/announce.ascii b _ o a zannoni@boifcc.cineca.it
13. Thu Sep 15 10:55:01 1994 15 crdras.GE.COM 82251 /pub/mammograms/registration/for_malek/LCC.pgm.Z b _ o a rjmitchell@crd.ge.com
14. Thu Sep 15 22:05:15 1994 1 palomar.ecn.purdue.edu 5248 /pub/mammograms/nijmegen-images/c17c.lab.Z b _ o a chuang@
15. Tue Sep 20 11:44:12 1994 1 alpha3.rad.med.umich.edu 2049 /pub/mammograms/announce.ascii a _ o a wei@alpha3.rad.med.umich.edu
16. Thu Sep 22 23:28:39 1994 1 banner.ecn.purdue.edu 2601 /pub/mammograms/nijmegen-images/ReadMe.1 a _ o a ke@purdue
17. Mon Sep 26 10:30:30 1994 1 kodaki.kodak.com 2049 /pub/mammograms/announce.ascii b _ o a flure@kodak.com

18. Fri Sep 30 19:48:54 1994 1 grad 2601 /pub/mammograms/nijmegen-images/ReadMe.1 a _ o a freeman@grad.csee.usf.edu
19. Mon Oct 3 19:23:14 1994 1 garlic.ece.utexas.edu 2601 /pub/mammograms/nijmegen-images/ReadMe.1 b _ o a chris@
20. Fri Oct 21 06:52:29 1994 1 143.50.41.3 2601 /pub/mammograms/nijmegen-images/ReadMe.1 a _ o a graif@bkfug.kfunigraz.ac.at
21. Thu Nov 3 14:46:37 1994 1 ultb-gw.isc.rit.edu 2049 /pub/mammograms/announce.ascii a _ o a joc8048@ultb.isc.rit.edu
22. Thu Nov 3 15:19:24 1994 6 fractal.ee.rochester.edu 2049 /pub/mammograms/announce.ascii a _ o a cchen@ee.rochester.edu
23. Mon Nov 7 16:56:14 1994 1 freud.ul.rp.CSIRO.AU 2601 /pub/mammograms/nijmegen-images/ReadMe.1 a _ o a ht@ul.rp.csiro.au
24. Wed Nov 9 05:30:44 1994 1 skopelos.csi.forth.gr 68 /pub/mammograms/nijmegen-images/c01c.mrk a _ o a chronaki@csi.forth.gr
25. Wed Nov 9 21:56:10 1994 1 boslf.delphi.com 2049 /pub/mammograms/announce.ascii a _ o a ALLAN79@DELPHI.COM
26. Thu Nov 17 12:14:31 1994 8333 skopelos.csi.forth.gr 1007616 /pub/mammograms/nijmegen-images/c01c.ima.Z b _ o a telemed@doris.forth.gr
27. Mon Nov 21 04:51:38 1994 1 jupiter.ceng.cea.fr 2049 /pub/mammograms/announce.ascii a _ o a dinten@dsys.ceng.cea.fr
28. Wed Nov 23 09:16:50 1994 1 EESUN2.TAMU.EDU 2601 /pub/mammograms/nijmegen-images/ReadMe.1 a _ o a shayne@tamu.edu
29. Fri Nov 25 14:24:50 1994 1 xor.ece.iit.edu 2882 /pub/mammograms/nijmegen-images/ReadMe.2 b _ o a gwang@ece.iit.edu
30. Sat Nov 26 11:32:03 1994 1 cortex.ama.ttuhsc.edu 68 /pub/mammograms/nijmegen-images/c01o.mrk b _ o a roberto@cortex
31. Mon Nov 28 11:15:54 1994 158 EESUN2.TAMU.EDU 1823797 /pub/mammograms/nijmegen-images/c02o.ima.Z b _ o a lee@tmu
32. Tue Nov 29 09:59:47 1994 341 mira.cc.umanitoba.ca 1935653 /pub/mammograms/nijmegen-images/c06o.ima.Z b _ o a luchka@ccu.umanitoba.ca
33. Wed Nov 30 10:44:10 1994 1 philabs.philips.com 2601 /pub/mammograms/nijmegen-images/ReadMe.1 b _ o a msa@philabs.philips.com
34. Tue Dec 6 19:04:37 1994 1 zark.maths.uts.edu.au 2601 /pub/mammograms/nijmegen-images/ReadMe.1 a _ o a hung@zark.maths.uts.edu.au
35. Mon Dec 12 15:10:37 1994 1 TVAX2.CDRH.FDA.GOV 2601 /pub/mammograms/nijmegen-images/ReadMe.1 a _ o a jsb@tvax2.cdrh.fda.gov
36. Fri Dec 30 04:07:09 1994 1 cps223.cps.cmich.edu 68 /pub/mammograms/nijmegen-images/c01c.mrk a _ o a wiley@
37. Tue Jan 10 03:09:37 1995 1 bern.student.uni-tuebingen.de 2049 /pub/mammograms/announce.ascii a _ o a bernhard.karten@student.uni-tuebingen.de
38. Mon Jan 16 00:00:32 1995 1695 matisse.phys.uts.EDU.AU 1522729 /pub/mammograms/nijmegen-images/c16c.ima.Z b _ o a zen@maths.uts.edu.au

39.  Thu  Jan  19  03:43:29  1995  1  caretta.engr.umbc.edu  2049 

/pub/mammograms/ announce. ascii  b  _  o  a  xuanj@engr.umbc.edu 

40.  Thu  Jan  19  04:06:58  1995  1  caretta.engr.umbc.edu  2049 

/pub/mammograms/announce.ascii  b  _  o  a  itl@engr.umbc.edu

41.  Sun  Jan  22  22:21:19  1995  1  dc4_p22.sprint.dialup.net  2049 

/pub/mammograms/announce.ascii  b  _  o  a  achristo@inteachristo@interserv.com

42.  Fri  Jan  27  20:28:31  1995  1  150.148.36.171  2601 

/pub/mammograms/nijmegen-images/ReadMe.1  b  _  o  a  jonathan@tvax2.cdrh.fda.gov

43.  Tue  Jan  31  11:37:15  1995  1  garbi.upc.es  2601 

/pub/mammograms/nijmegen-images/ReadMe.1  a  _  o  a  calderon@lsi.upc.es

44.  Wed  Feb  8  18:11:03  1995  1  gandalf.cs.andrews.edu  2049 

/pub/mammograms/announce.ascii  b  _  o  a  wolfer@anerew.edu

45.  Tue  Feb  21  21:03:26  1995  25  grad  2511773 

/pub/mammograms/nijmegen-images/c05o.ima.Z  a  _  o  a  namuduri@ 

46.  Tue  Feb  28  14:25:38  1995  340  selma.trg.SAIC.COM  1233505 

/pub/mammograms/nijmegen-images/c06c.ima.Z  b  _  o  a  edb@selma.trg.saic.com 

47.  Mon  Mar  6  20:27:56  1995  1  van-gogh.ee.ubc.ca  2601 

/pub/mammograms/nijmegen-images/ReadMe.1  a  _  o  a  sameti@ee.ubc.ca

48.  Thu  Mar  16  17:37:52  1995  1  dsp6.eng.umd.edu  2049 

/pub/mammograms/announce.ascii  a  _  o  a  farrokhi@eng.umd.edu 

49.  Mon  Mar  27  16:07:12  1995  1  calypso.NMSU.Edu  24 

/pub/mammograms/nijmegen-images/c03o.mrk  b  _  o  a  sanaya@vrl.com 

50.  Sun  Apr  2  14:38:00  1995  1  avmac.biophysics.mcw.edu  2120 

/pub/mammograms/announce.ascii  b  _  o  a  rwcox@mcw.edu 

51.  Tue  Apr  4  04:47:55  1995  1  bimigw.kfunigraz.ac.at  2601 

/pub/mammograms/nijmegen-images/ReadMe.1  a  _  o  a  gabler@balu.kfun

52.  Thu  Apr  6  12:59:48  1995  2  freewill.tu-graz.ac.at  4234 

/pub/mammograms/nijmegen-images/c19c.lab.Z  b  _  o  a  mforst@sbox.tu-graz.ac.at

53.  Tue  Apr  11  15:06:48  1995  67  waterfall  1799397 

/pub/mammograms/nijmegen-images/c01c.ima.Z  b  _  o  a  hall@ 

54.  Thu  Apr  13  20:35:56  1995  140  kirchoff.ee.rochester.edu  1799397 

/pub/mammograms/nijmegen-images/c01c.ima.Z  b  _  o  a  nasan@

55.  Fri  Apr  14  01:33:53  1995  29  scheifler  1110367 

/pub/mammograms/nijmegen-images/c02c.ima.Z  b  _  o  a  rashedi@ 

56.  Tue  Apr  18  11:47:50  1995  1  MAMMO.PNDR.UPENN.EDU  2601 

/pub/mammograms/nijmegen-images/ReadMe.1  a  _  o  a  toto@mammo.pndr.upenn.edu

57.  Tue  Apr  18  14:56:16  1995  1  tilki4.nswc.navy.mil  2120 

/pub/mammograms/announce.ascii  b  _  o  a  dmarche@tilki4.nswc.navy.mil 

58.  Thu  Apr  20  21:11:56  1995  270  kurtz.eee.utas.edu.au  1153589 

/pub/mammograms/nijmegen-images/c19o.ima.Z  b  _  o  a  H.Talhami@eee.utas.edu.au
59.  Tue  Apr  25  10:50:14  1995  1  131.174.82.77  4171

/pub/mammograms/nijmegen-images/c01c.lab.Z  b  _  o  a  nico@mbfys.kun.nl

60.  Wed  May  17  15:42:35  1995  1  zztop.eng.usf.edu  3472 

/pub/mammograms/nijmegen-images/c01c.lut.Z  b  _  o  a  naiyer@

61.  Thu  May  18  00:20:06  1995  1  dixie.cs.ubc.ca  4171 

/pub/mammograms/nijmegen-images/c01c.lab.Z  b  _  o  a  bandari@

62.  Tue  May  30  10:09:52  1995  1  cnea.edu.ar  2717 

/pub/mammograms/nijmegen-images/ReadMe.1  a  _  o  a  gameor@cmea.edu.ar

63.  Wed  Jun  7  10:28:04  1995  1  tkl3.oulu.fi  2717 

/pub/mammograms/nijmegen-images/ReadMe.1  b  _  o  a  hannu@ee.oulu.fi

64.  Sat  Jun  17  23:53:39  1995  1  grad  3472 

/pub/mammograms/nijmegen-images/c01c.lut.Z  b  _  o  a  henrique@

65.  Sun  Jun  18  11:40:14  1995  24  paris.eng.utsa.edu  194077 

/pub/mammography_papers/mic92.ps.Z  b  _  o  a  yzhou@

66.  Tue  Jun  27  15:43:28  1995  1  iris.stsci.edu  2120 

/pub/mammograms/announce.ascii  a  _  o  a  hanisch@stsci.edu

67.  Thu  Jun  29  14:08:22  1995  1  lab-pc-61.bus.umich.edu  4213 

/pub/mammograms/nijmegen-images/c01o.lab.Z  b  _  o  a  Netscape@lab-pc-61.bus.umich.edu

68.  Fri  Jun  30  17:44:07  1995  1  sunflash  4171 

/pub/mammograms/nijmegen-images/c01c.lab.Z  b  _  o  a  weyron@


